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PETRELS: Parallel Estimation and Tracking of 
Subspace by Recursive Least Squares from 

Partial Observations 
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Abstract 

Many real world data sets exhibit an embedding of low-dimensional structure in a high-dimensional 
manifold. Examples include images, videos and internet traffic data. It is of great significance to reduce 
the storage requirements and computational complexity when the data dimension is high. Therefore 
we consider the problem of reconstructing a data stream from a small subset of its entries, where 
the data is assumed to lie in a low-dimensional linear subspace, possibly corrupted by noise. We 
further consider tracking the change of the underlying subspace, which can be applied to applications 
such as video denoising, network monitoring and anomaly detection. Our problem can be viewed as a 
sequential low-rank matrix completion problem in which the subspace is learned in an on-line fashion. The 
proposed algorithm, dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS), 
first identifies the underlying low-dimensional subspace via a recursive procedure for each row of the 
subspace matrix in parallel with discounting for previous observations, and then reconstructs the missing 
entries via least-squares estimation if required. Numerical examples are provided for direction-of-arrival 
estimation and matrix completion, comparing PETRELS with state of the art batch algorithms. 
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subspace estimation and tracking, recursive least squares, matrix completion, partial observations, 
online algorithms 

I. Introduction 

Many real world data sets exhibit an embedding of low-dimensional structure in a high-dimensional 
manifold. When the embedding is assumed linear, the underlying low-dimensional structure becomes a 
linear subspace. Subspace Identification and Tracking (SIT) plays an important role in various signal 
processing tasks such as online identification of network anomalies lU, moving target localization [2], 
beamforming 131, and denoising Q. Conventional SIT algorithms collect full measurements of the data 
stream at each time, and subsequently update the subspace estimate by utilizing the track record of the 
stream history in different ways ll5]|. |[6ll. 

Recently advances in Compressed Sensing (CS) Q, lH and in the theory of Matrix Completion (MC) 
|[9l . ifTOl have made it possible to infer data structure from highly incomplete observations. Compared 
with CS, which allows reconstruction of a single vector from only a few attributes by assuming it is 
sparse in a pre-determined basis or dictionary, MC allows reconstruction of a matrix from a few entries 
by assuming it is low-rank. A popular method to perform MC is to minimize the nuclear norm of the 
underlying matrix lITOl which requires no prior knowledge of rank, in parallel with ii minimization 
for sparse recovery in CS. Other approaches include greedy algorithms such as OptSpace ifTTIl which 
requires an estimate of rank for initialization. Identifying the underlying low-rank structure in MC is 
equivalent to subspace identification in a batch setting. When the number of observed entries is slightly 
larger than the subspace rank, it has been shown that with high probability, it is possible to test whether 
a highly incomplete vector of interest lies in a known subspace fT2l. 

In high-dimensional problems, it might be expensive and even impossible to collect data from all 
dimensions. For example in wireless sensor networks, collecting from all sensors continuously will quickly 
drain the battery power. Ideally, we would prefer to only collect data from a fixed budget of sensors each 
time to increase the overall battery life, and still be able to identify the underlying structure. Another 
example is in onhne recommendation systems, where it is impossible to expect rating feedbacks from 
all users on every product. Therefore it is of growing interest to identify and track a low-dimensional 
subspace from highly incomplete information of a data stream in an online fashion. In this setting, the 
estimate of the subspace is updated and tracked across time when new observations become available 
with low computational cost. The GROUSE algorithm |[T3l has been proposed for SIT from online partial 
observations using rank-one updates of the estimated subspace on the Grassmannian manifold. However, 
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performance is limited by the existence of "barriers" in the search path |[T4l which result in GROUSE 
being trapped at a local minima. We demonstrate this behavior through numerical examples in Section |Vl] 
in the context of direction-of-arrival estimation. 

In this paper we further study the problem of SIT given partial observations from a data stream 
as in GROUSE. Our proposed algorithm is dubbed Parallel Estimation and Tracking by REcursive 
Least Squares (PETRELS). The underlying low-dimensional subspace is identified by minimizing the 
geometrically discounted sum of projection residuals on the observed entries per time index, via a recursive 
procedure with discounting for each row of the subspace matrix in parallel. The missing entries are then 
reconstructed via least-squares estimation if required. The discounting factor balances the algorithm's 
ability to capture long term behavior and changes to that behavior to improve adaptivity. We also benefit 
from the fact that our optimization of the estimated subspace is on all the possible low-rank subspaces, not 
restricted to the Grassmannian manifold. We prove the convergence properties of PETRELS by revealing 
its connection with the well-known Projection Approximation Subspace Tracking (PAST) algorithm |l5ll 
in the full observation scenario. Finally, we provide numerical examples to measure the impact of the 
discount factor, estimated rank and number of observed entries. In the context of direction-of-arrival 
estimation we demonstrate superior performance of PETRELS over GROUSE in terms of separating 
close-located modes and tracking the changes in the scene. We also compare PETRELS with state of 
the art batch matrix completion algorithms, showing it as a competitive alternative when the subspace is 
fixed. 

The rest of the paper is organized as follows. Section |ll] states the problem and provides background in 
the context of matrix completion and conventional subspace tracking. Section |lll] describes the algorithm 
in detail. Two extensions of PETRELS to improve robustness and reduce complexity are presented in 
Section |IVl We discuss convergence issues of PETRELS in Section |Vl Section |Vl] shows the numerical 
results and we conclude the paper in Section |VII[ 

II. Problem Statement and Related Work 

A. Problem Statement 

We consider the following problem. At each time t, a vector € M^^ is generated as: 

= Ufa* + G M^, (1) 

where the columns of € M^^^*"* span a low-dimensional subspace, the vector € M^' specifies the 
linear combination of columns and is Gaussian distributed as a^ ~ A/'(0,Ir.J, and is additive white 
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Gaussian noise distributed as ~ A/^(0, ct^Im)- The rank of the underiying subspace rt is not assumed 
known exactly and can be slowly changing over time. The entries in the vectors xt can be considered 
as measurements from different sensors in a sensor network, or values of different pixels from a video 

frame. In practice, the upper bound of the subspace rank is considered known such that rt < r for any t. 
We assume only partial entries of the full vector xj are observed, given by 

= pt G xt = Ptxt G (2) 

where denotes point-wise multiplication, = diag[pt], = \pit,P2tr-- ,PMtY S {0,1}^ with 
Pmt = 1 if the mth entry is observed at time t. We denote fit = {m : Pmt = 1} as the set of observed 
entries at time t. In a random observation model, we assume the measurements are taken uniformly at 
random. 

We are interested in an online estimate of a low-rank subspace D„ € M^^^ at each time index n, 
which identifies and tracks the changes in the underlying subspace, from streaming partial observations 
{jt, Vt)t=i- The rank of the estimated subspace D„ is assumed known and fixed throughout the algorithm 
as r. In practice, we assume is the upper bound of the rank of the underlying subspace. The desired 
properties for the algorithm include: 

• Low complexity: each step of the online algorithm at time index n should be adaptive with small 
complexity compared to running a batch algorithm using history data; 

• Small storage: The online algorithm should require a storage size that does not grow with the data 
size; 

• Convergence: The subspace sequence generated by the online algorithm should converge to the true 
subspace U = U„ if it is constant under the assumption that the sampled covariance matrix of xj 
converges. 

• Adaptivity: The online algorithm should be able to track the changes of the underlying subspace in 
a timely fashion. 

B. Conventional Subspace Identification and Tracking 

When x^'s are fully observed, our problem is equivalent to the classical SIT problem, which is widely 
studied and has a rich literature in the signal processing community. Here we describe the Projection 
Approximation Subspace Tracking (PAST) algorithm in detail which is the closest to our proposed 
algorithm in the conventional scenario. First, consider optimizing the scalar function with respect to 
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a subspace W € M^^^'', given by 

J(W) =E||xt-WW^xt||i, (3) 

When Uj = U is fixed over time, let Cx = E[xfxJ^] = UU^ + ct'^Im be the data co variance matrix. It 
is shown in Q that the global minima of Q is the only stable stationary point, which is obtained by 
W = UrQ, where is composed of the r dominant eigenvectors of Cx, and Q G C"^*" is a unitary 
matrix. When the data is noise-free, we can choose = U. This motivates PAST to optimize the 
following function at time n without constraining W to have orthogonal columns: 

n 

W„ = argmin V a""' ||xt - WW^Xt||^, (4) 

n 

« argmin V a^-'\\^t - WWl_^^t\\l (5) 

where the expectation in (|3]l is replaced by geometrically weighting the previous observations by a in 
(|4]), which is further approximated by replacing the second W by its previous estimate in (|5]l. Based 
on the subspace W„ can be found by first estimating the coefficient using the previous subspace 
estimate as a„ = W^_]^x„, then update the subspace as 

n 

W„ = argmin ^a^'^Hxj - Wai||2. (6) 

Now let a = 1 and R„ = X]j"=i ^^n^^- In ifTSl . the asymptotic dynamics of the PAST algorithm is 
described by the Ordinary Differential Equation (ODE) below and its equilibrium as t goes to infinity: 

R = E[a„a^] - R = W^CxW - R, 
W = E[x„(x„ - Wa„)^]Rt = (I - WW^)CxWRt, 

where a„ = W^x„, R = R(t) and W = W(t) are continuous time versions of R„ and W„, and f 
denotes pseudo-inverse. It is proved in fl5l that as t increases, W(i) converges to the global optima, i.e. 
to a matrix which spans the eigenvectors of Cx corresponding to the r largest eigenvalues. In Section IVl 
we show that our proposed PETRELS algorithm becomes essentially equivalent to the PAST algorithm 
when all entries of the data stream are observed, and can be shown to converge globally. 

In particular, the PAST algorithm belongs to the class of power-based methods, together with the Oja 
method ||T6l, the Novel Information Criterion (NIC) method ||T7l and their variations. These methods 
are treated under a unified framework in ifTSl . The estimate of the low -rank subspace W„ G R*^^*^' is 
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updated at time n as 



1/2 



(7) 



where C„ is the sample data covariance matrix updated from 



C 




(8) 



where a„ is a parameter between and 1. The normalization in ([7]) assures that the updated subspace 
W„ is orthogonal but this normalization is not performed strictly in different algorithms. 

It is shown in [18] that these power-based methods guarantee global convergence to the principal 
subspace spanned by eigenvectors corresponding to the r largest eigenvalues of Cx- If the entries of 
the data vector x^'s are fully observed, then converges to Cx very fast, and this is exactly why the 
power-based methods perform very well in practice. When the data is highly incomplete, the convergence 
of dHJl is very slow since only a small fraction |r2„p/n^ of entries in C„_i are updated, where \Q,n\ is 
the number of observed entries at time n, making direct adoption of the above method unrealistic in the 
partial observation scenario. 

C. Matrix Completion 

When only partial observations are available and Ut = U are fixed, our problem is closely related to 
the Matrix Completion (MC) problem, which has been extensively studied recently. Assume X G ]^A^x" 
is a low-rank matrix, P is an M x n mask matrix with at missing entries and 1 at observed entries. 
Let Y = P X be the observed partial matrix, where denotes point-wise multiplication. Matrix 
completion aims to solve the following problem: 



i.e. to find a low-rank matrix such that the observed entries are satisfied. This problem is combinatorially 
intractable due to the rank constraint. It has been shown in 121 that by replacing the rank constraint with 
nuclear norm minimization, ^ can be replaced by a convex optimization problem, given the following 
spectral-regularized MC problem: 



where ||Z||* is the nuclear norm of Z, i.e. the sum of singular values of Z, and > is a regularization 
parameter. Interestingly, the solution of (ITOl) is the same as that of i.e. provides exact recovery for 



min rank(Z) s.t. P (X - Z) = 



(9) 



min -||P0(X-Z)||^ + /i||Z 



(10) 
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matrix completion under mild conditions 191. The nuclear norm |fT9ll of Z is given by 



1 

mm — 

U,V:Z=UVr 2 



(11) 



where U € C^^^** and V G C"^^. Substituting ([TTll in CO]) we can rewrite the MC problem as 



Our problem formulation can be viewed as an online way of solving the above batch-setting MC 
problem, where the columns of X are drawn randomly and treated as a new measurement at each 
time index with a fixed underlying subspace \Jt = U. The online algorithm has potential advantages for 
adapting changes in matrix size and avoiding large matrix manipulations. We will compare the PETRELS 
algorithm against some of the popular MC algorithms proposed recently in Section |Vll 

D. The GROUSE Algorithm 

The GROUSE algorithm |fT3l proposed by Balzano et. al. addresses the same problem of online 
identification of low-rank subspace from highly incomplete information. It aims to minimize the projection 
residual on observed entries with respect to the underlying subspace at each time n by using previous 
estimate D„_i as a warm start on the Grassmannian manifold 



The resulting algorithm is a fast rank-one update on D„_i at each time n; convergence to a stationary 
point but not global optimal is guaranteed under mild conditions on the step-size. However, performance 
is limited by the existence of "barriers" in the search path |[T4l which result in GROUSE being trapped at a 
local minima. We will compare against GROUSE with more details in Section ITlI-CI after the formulation 
of our method and provide numerical comparisons in Section |Vll 

III. The petrels Algorithm 

We now describe our proposed Parallel Estimation and Tracking by REcursive Least Squares (PE- 
TRELS) algorithm. 




(12) 



= {D G M^^'^ : D^D = IJ. 
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A. Objective Function 

We first define the function /t(D) at each time t = 1, • • • , n for a fixed subspace D S M^^^**, which 
is the total projection residual on the observed entries, 

/^(D) = min|[Pt(xt-Dat)||2, t = l,-- - ,n. (13) 

at 

Here r is the rank of the estimated subspace, which is assumed known and fixed throughout the algorithrrQ. 
We aim to minimize the following loss function at each time n with respect to the underlying subspace: 

n 

T>n = argmin F„(D) = argmin V A"-*/t(D), (14) 

where D„ is the estimated subspace of rank r at time n, and the parameter <C A < 1 discounts past 
observations. 

To motivate the loss function in ([141 ) we note that if Ut = U is not changing over time, then the RHS 
of ([T4l) is minimized to zero when D„ spans the subspace defined by U. If is slowly changing, then 
A is used to control the memory of the system and maintain tracking ability at time n. For example, by 
using A ^ 1 the algorithm gradually loses its ability to forget the past. 

Fixing D, ft{D) can be written as 

/t(D) = (Pt - PtD(D'^P4D)tD'^Pt) xt, (15) 

where j denotes matrix pseudo-inverse. Plugging this back to (fT4l ) the exact optimization problem 
becomes: 

n 

D„ = argmin V A"-*xf (p^ - PtJy{B^PtJ^)^B^Pt) ^t, 

which is difficult to solve over D and requires storing all previous observations. Instead, we propose 
PETRELS to approximately solve this optimization problem. 

Before developing PETRELS we note that if there are further constraints on the coefficients aj's, a 
regularization term can be incorporated as: 

/j(D) = min ||Pt(Dat-xt)||i + /3||aj|lp, (16) 

where p > 0. For example, p = 1 enforces a sparse constraint on a^, and p = 2 enforces a norm constraint 
on Sif 

'The rank may not equal the true subspace dimension. 
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In (fT4l ) the discount factor A is fixed, and the influence of past estimates decreases geometrically; a 
more general online objective function can be given as 

F„(D) = A„F„_i(D) + /„(D), (17) 

where the sequence {A„} is used to control the memory and adaptivity of the system in a more flexible 
way. 

Algorithm 1 PETRELS for SIT from Partial Observations 
Input: a stream of vectors yj and observed pattern P^. 

Initialization: an M x r random matrix Do, and (Rm)^ = ^^r, (5 > for all m = 1, • • • , M. 
1: for n = 1, 2, • • • do 

2: a„ = (D^_iP„D„_i)tD^_iy„. 

4: for m = 1, • • • , M do 

5: /3« =l + A-ia^(R;r')ta„, 
v« =A~i(Rr')W, 

(R:^)t = A-i(Rr')^ +pMi3^;kr'^zi^if, 

fin — fin-1 , ( — a^rl"~lVR" "ita 

9: end for 
10: end for 



B. PETRELS 

The proposed PETRELS algorithm, as summarized by Algorithm \T\ alternates between coefficient 
estimation and subspace update at each time n. In particular, the coefficient vector a„ is estimated by 
minimizing the projection residual on the previous subspace estimate D„_i: 

a„ = argmin ||Pn(xn - D„_ia)||| 

= (DLiP„D„_i)tDLiy„, (18) 

where Dq is a random subspace initialization. The full vector x„ can then be estimated subsquently as: 

x„ = D„_ia„, (19) 

and the residual error is given as 
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The subspace D„ is then updated by minimizing 

n 

D„ = argmin V A"-*||Pt(xt - T>3.t)\\l (20) 

D t=l 

where a^, t = I,-- - ,n are estimates from (fTSl ). Comparing (l20l) with (fT4b . the optimal coefficients 
are substituted for the previous estimated coefficients. This results in a simpler problem for finding 
D„. The discount factor mitigates the error propagation and compensates for the fact that we used the 
previous coefficients updated rather than solving ([141) directly, therefore improving the performance of 
the algorithm. 

The objective function in (l20l ) can be equivalently decomposed into a set of smaller problems for each 
rowof D„ = [d^,d^,.-. ,dl,f as 

n 

d;^ = argmin ^ X^'^Pmt {xmt- afdmf, (21) 

for m = 1, • • • , M. To find the optimal dj^, we equate the derivative of the RHS of (|2TI ) to zero, resulting 
in 



A" Wataf d;; - ^ A" ^PmtXmtSit = 0. 



\t=i / t=i 

which can be rewritten as 

d" = s" (22) 
where R;^ = YTt=\ A'^'VmtataJ', and s^^ = YTt=\ X"'~^PmtXmtat. Therefore, can be found as 

= (R;j,)ts;^. (23) 



When RJ^ is not invertible, this finds the least-norm solution to dj^. Now we show how (l22l ) can be 
updated recursively. First we rewrite 

■^m — "^-f^m Pmn^n^n ) (24) 

— "^^m ~^ PmnXmn^ri': (25) 
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for all m = 1, • • • , M. Then we plug U3 and UB into (122]|, and get 



Rn jn _ \„n-l , 
m m ^ m ' ymn-^mn^-n 



^■"■^m '-'■m I l-'mri'^mn'^n 



_ -ryn in-1 _ n'^H""^ -I- « t- a 

— ■'■''m'-'-m /^mn"-n"-n '-'m ^ ymrf^mn'^n 

— -^m^m ~^ Pmni^mn ^n^m (26) 

where d"~^ is the the row estimate in the previous time n — 1. This results in a parallel procedure to 
update all rows of the subspace matrix D^, give as 

^m ~ ^m ~^ Pmn{Xmn ^n^m )(-f^m)^^n.' (27) 

Finally, by the Recursive Least-Squares (RLS) updating formula for the general pseudo-inverse matrix 
EOl . 1211 > (R-m)^ '^an be easily updated without matrix inversion using 

= A-i(R;r')HpmtG;^; (28) 
where G^, = H^Vl'^^mi^mV ' with and v^J, given as 

v;^ = A-i(Rr')ta„. 

To enable the RLS procedure, the matrix (R^)^ is initialized as a matrix with large entries on the 
diagonal, which we choose arbitrarily as the identity matrix (Rj^^)^ = 51r, 5 > for all m = 1, • • • , M. 
It is worth-noting that the fast implementation of RLS updating rules is in general very efficient. However, 
cautions need to be taken since direct application of fast RLS algorithms suffer from numerical instability 
of finite-precision operations when running for a long time |[22l . 



C. Comparison with GROUSE 

The GROUSE method can be viewed as optimizing (fT4b for A = 1 at each time n using a first-order 
stochastic gradient descent on the orthogonal Grassmannian defined as Gr = {D € ]^Afxr . _ j^j 
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instead of M*^^''. Thus, in the GROUSE algorithm, 

n 

D„ = argmin G„ (D) = argmin ft (D) . (29) 
GROUSE updates the subspace estimate along the direction of V/((D)|d=d„_i on Qr, given by 

D„ = D„_i - [(cos(o-??„) - 1) 11^^*11 + 

Iktlb ||an||2 

where a = ||xt||2||rf ||2, and r/„ is the step-size at time n. At each step GROUSE also alternates between 
coefficient estimation ([TSl l and subspace update (l30b . If the step size satisfies 



lim r/„ = and ^ r]t = oo, 

then GROUSE is guaranteed to converge to a stationary point of G„(D). However, due to the existence 
of "barriers" in the search path on the Grassmannian llT4l . GROUSE may be trapped at a local minima 
as shown in Section |Vl] in the example of direction-of-arrival estimation. 

D. Second-Order Stochastic Gradient Descent 

The PETRELS algorithm can be regarded as a second-order stochastic gradient descent method to 
solve ([141 ) by using dj^"^, m = 1, • • • , M as a warm start at time n. Specifically, we can write the 
gradient of /n(D) in ( [T3l ) at D„_i as 

dfn(D) 



2P„(x„ - D„_ia„)a^, (31) 



D=D„ 



where is given in (fTSj) . Then the gradient of F„(D) at D„_i is given as 

5f„(d; 



D=D„_. 



Therefore the Hessian for each row of D at d""^ is 



2j]A"-*Pt(xt-D„_iat)a; 



g^i^n(D) 



2 J]A"-W«ataf. (32) 



t=i 
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Hence the updating rule for each row dm given in (1271) can be written as 

jn _ jn-1 TT (/\n-l 

dm - dm --tin (dm ^ ^) ^ ,n~l ' 

which is equivalent to second-order stochastic gradient descent approach. Compared with first-order 
algorithms, PETRELS enjoys a faster convergence speed to the stationary point of F„(D) and gets rid 
of the problem of tuning the step-size l23l . 

Remark: If we relax the objective function of GROUSE (|29l ) to all rank-r subspace M^^*", given as 

n 

Y)n = argmin /t(D), 

which is equivalent to PETRELS without discounting. Therefore following the discussions above, it is 
possible to use a second-order stochastic gradient descent method to update the underlying subspace, 
yielding the update rule for each row of D„ as 

d^m = - 7nH„(dr\ A = ly'^i^, (34) 

where 7„ is the step-size at time n. Compared with the update rule for PETRELS in (|33] ). the dis- 
count parameter has a similar role as the step-size, but weights the contribution of previous data input 
geometrically. However, in this paper we didn't investigate through the performance of (l34l) . 

IV. Extensions of the PETRELS Algorithm 

A. Simplified PETRELS 

In the subspace update step of PETRELS in ( [20l ). consider replacing the objective function in ([141 ) by 

D„ = argmin F„(D) 

D 

n 

= argmin VA""*||xt -Datll^, (35) 

where and xt, t = 1, • • • ,n are estimates from earlier steps in ( fTSl ) and ( fT9l ). The only change we 
made is to remove the partial observation operator from the objective function, and replace it by the full 
vector estimate. It remains true that d^ = argmin^j ^ F„(dm) = d^^^ if the con^esponding mth entry of 
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Xn is unobserved, i.e. m ^ since 



n-l 

Fn{dm) = - d^a^lli + ||(d;;-i - d„)'^at||2, 

t=i 

= AF„-i(d„) + ||(dr' - d^fatWl 
>AF„_i(d«~^) = F4dr') 

is minimized when dm = d^~^ for m ^ 0„. 

This modification is equivalent to the original PETRELS in the full observation scenario, but generally 
leads to a simplified updating rule for RJ^, since now the updating formula for all rows d^'s is the 
same, where = R„ = ARn_i + a„a^ for all m. The row updating formula (|27] | is replaced by 

D„ = D„_i + Pn(x„ - D„_ia„)a^R],, (36) 

which further saves storage requirement for the PETRELS algorithm from 0{Mr'^), to store all R^'s, 
to ©(r^). We compare the modified version with PETRELS in terms of performance in Section IVTl 

B. Incorporating Prior Information 

It is possible to incorporate regularization terms into PETRELS to encode prior information about the 
data stream. Here we outline the regularization on the subspace D in the subspace update step, such that 
at each time n, D„ is updated via 

n 

D„ = argmin V A"-*||Pt(xt - Dat)||l + Mn||D|||, (37) 
D t=i 

where ^„ > is the regularization parameter. Similarly to PETRELS, it can be decomposed for each 
row of D = [di,d2, • • • ,dM]^ as 

n 

dm = argmin^ A"'"*pmt(xmt - a^dmf + lJin\\dm\\l 

= fT" )"^s" 
where TJ^ can be updated as 
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and sj^ can be updated as (|25] ). However the fast RLS algorithm no longer applies here, so additional 
complexity for matrix inversion is required. It is worth noticing that (l37l) closely resembles the matrix 
completion formula ([T2l) when V is fixed and composed of columns of a^'s. 

V. Convergence Analysis in the Full Observation Scenario 

In the full observation regime, i.e. y„ = x„ for all n, the PETRELS algorithm becomes essentially 
equivalent to the PAST algorithm Q for SIT, except that the coefficient is estimated as a„ = D^_^y„ = 
D^_ix„ in the PAST algorithm versus a„ = (D^_iD„_i)~iD^_iX„ in PETRELS. Now let A = 1. 
Similar to PAST in |[T5l . the asymptotic dynamics of the PETRELS algorithm can be described by the 
ODE below, 

R = E[a„a^] - R 

= (D^D)-iD^CxD(D^D)-^ - R, (38) 
D = E[x„(x„-Da„f]Rt 

= (I - D(D'^D)"^D^)CxD(D^D)"^R"^ (39) 

where a„ = (D^D)^^D^x„, R = R(t) and D = D(t) are continuous-time versions of R„ and D„. 
Now let D = D(D'^D)-i/2 ^nd R = (D'^D)1/2r(d^D)1/^ since from (HHl 

D^D = D^(I - D(D^D)-1D^)CxD(D^D)^R^ = 0, 

we have ^(D'^D) = D'^D + D^D = and further ^/(D'^D) = for any function of D^D. Hence, 

D = D(D^D)-V2 + D-(D^D)-V2 = D(D^D)-i/2, (40) 

dt 

i = ^(D^D)-i/2r(D^D)1/2 + (D^D)V2r(d^D)1/2 

+ (D^D)1/2r1(d^D)1/2 = (D^D)V2r(d^D)V2. (41) 

Therefore (l38l) and (|39l ) can be rewritten as 

R = D^CxD - R, 

D = (I-DD^)CxDR^ 

which is equivalent to the ODE of PAST. Hence we conclude that PETRELS will converge to the global 
optima in the same dynamic as the PAST algorithm. 
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VI. Numerical Results 

Our numerical results fall into four parts. First we examine the influence of parameters specified in 
the PETRELS algorithm, such as discount factor, rank estimation, and its robustness to noise level. Next 
we look at the problem of direction-of-arrival estimation and show the proposed PETRELS algorithm 
demonstrates performance superior to GROUSE by identifying and tracking all the targets almost perfectly 
even in low SNR. Thirdly, we compare our approach with matrix completion, and show that PETRELS 
is at least competitive with state of the art batch algorithms. Finally, we provide numerical simulations 
for the extensions of the PETRELS algorithm. 

A. Choice of Parameters 

At each time t, a vector is generated as 

Xi = Dt,.„eat + nt,f = 1,2, • • • (42) 

where T)true is an r-dimensional subspace generated with i.i.d. J\f{0, 1) entries, is an r x 1 vector with 
i.i.d. M{0, 1) entries, and is an m, x 1 Gaussian noise vector with i.i.d. AA(0,e^) entries. We further 
fix the signal dimension m = 500 and the subspace rank rtme = 10. We assume that a fixed number of 
entries in x^, denoted by K, are revealed each time. This restriction is not necessary for the algorithm 
to work as shown in matrix completion simulations, but we make it here in order to get a meaningful 
estimate of a^. Denoting the estimated subspace by D, we use the normalized subspace reconstruction 
error to examine the algorithm performance; this is calculated as ||'Pf)^D(rMe|lF/||DtrMe|||'> where "Pj^,^ 
is the projector for the orthogonal subspace . 

The choice of discount factor A plays an important role in how fast the algorithm converges. With 
K = 50, a mere 10% percent of the full dimension, the rank is estimated accurately as r = 10 in a 
noise-free setting where e = 0. We run the algorithm to time n = 2000 for the same data, the normalize 
subspace reconstruction error is minimized when A is around 0.98 in Fig.[T] Hence, we will keep A = 0.98 
hereafter. 

In reality it is almost impossible to accurately estimate the intrinsic rank in advance. Fortunately the 
convergence rate of our algorithm degrades gracefully as the rank estimation error increases. In Fig. |2j 
the evolution of normalized subspace error is plotted against data stream index, for rank estimation 
r = 10, 12, 14, 16, 18. We only examine the rank over-estimation case here since this is usually the 
case in applications, and the readers are referred to the next section for examples for the case of rank 
underestimation. 
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forgetting parameter X 



Fig. 1. The normalized subspace reconstruction error as a function of the discount factor A after running the algorithm to time 
n = 2000 when 50 out of 500 entries of the signal are observed each time without noise. 




200 400 600 800 1000 1200 1400 1600 1800 2000 
data stream index 



Fig. 2. The normalized subspace reconstruction error as a function of data stream index when the rank is over-estimated when 
50 out of 500 entries of the signal are observed each time without noise. 

Taking more measurements per time leads to faster convergence since it is approaching the full 
information regime, as shown in Fig. |3] Theoretically it requires M ~ 0{r\ogr) ^ 23 measurements to 
test if an incomplete vector is within a subspace of rank r ifTll . and the simulation shows our algorithm 
can work even when M is close to this lower bound. 

Finally the robustness of the algorithm is tested against the noise variance in Fig. HI where the 
normalized subspace error is plotted against data stream index for different noise level e . Since the 
proposed PETRELS algorithm is a second-order algorithm as articulated in Section ITlI-DI the normalized 
subspace error converges to the error floor determined by the noise variance, (mark: I'm not sure if there's 
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Fig. 3. The normalized subspace reconstruction error as a function of data stream index when the number of entries observe 
per time M out of 500 entries are varied with accurate rank estimation and no noise. 



a close-form formula for this). 




data stream index 



Fig. 4. The normalized subspace error against data stream index with different noise level e when 50 out of 500 entries of the 
signal are observed each time with accurate rank estimation r = 10. 



We now consider a scenario where a subspace of rank r = 10 changes abruptly at time index n = 3000 
and n = 5000, and examine the performance of GROUSE |fT3l| and PETRELS in Fig. |5] when the rank is 
over-estimated by 4 and the noise level is e = 10^'^. The normalized residual error for data stream and 
normalized subspace error are shown respectively in Fig. |5] (a) and (b). Both PETRELS and GROUSE 
can successfully track the changed subspace, but PETRELS can track the change faster. 
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(a) Normalized residual error (b) Normalized subspace error 

Fig. 5. The normalized subspace error when the underlying subspace is changing with fixed rank r = 10. The rank is over- 
estimated by 4 and the noise level is e = 10~^, when 50 out of 500 entries of the signal are observed each time for both 
GROUSE and PETRELS. 



B. Direction-Of-Arrival Analysis 

Given GROUSE |[T3l as a baseline, we evaluate the resilience of our algorithm to different data models 
and applications. We use the following example of Direction-Of-Arrival analysis in array processing to 
compare the performance of these two methods. Assume there are n = 256 sensors from a linear array, 
and the measurements from all sensors at time t are given as 

xt = VI]at + nt,t = 1,2,--- (43) 

where V € C"^p is a Vandermonde matrix given as 

V = [Qi(a;i),--- ,Qp(a;p)], (44) 

where a^{uji) = [1, e^'^'^"- , • • • , e-''2'^'^'("-i)l]^, < < 1; S = diag{d} = diag{(ii,--- ,dp} is a 
diagonal matrix which characterizes the amplitudes of each mode. The coefficients at are generated with 
A/'(0, 1) entries, and the noise is generated with 7V(0, e^) entries, where e = 0.1. 

Each time we collect measurements from = 30 random sensors. We are interested in identifying 
all {oJiYi=i {diYi=i- "^^^^ can be done by applying the well-known ESPRIT algorithm ll24l to the 
estimated subspace D of rank r, where r is specified a-priori corresponding to the number of modes to 
be estimated. Specifically, if Di = D(l : n — 1) and D2 = D(2 : n) are the first and the last n — 1 
rows of D, then from the eigenvalues of the matrix T = d|D2, denoted by Aj, i = 1, • • • , r, the set of 
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Wi}^=i be recovered as 

uji = argXi, i = !,■■■ , r. (45) 

ZTT 

The ESPRIT algorithm also plays a crucial role in recovery of the multi-path delays from low-rate samples 
of the channel output from transmitting pulse streams with known shape ll25l . 

In a dynamic setting when the underlying subspace is varying, the proposed PETRELS algorithm does 
a better job of discarding out-of-date modes and picking up new ones. We divide the running time into 
4 parts, and the frequencies and amplitudes are specified as follows: 

1) Start with the same frequencies 

UJ = [0.1769, 0.1992, 0.2116, 0.6776, 0.7599]; 

and amplitudes 

d= [0.3, 0.8, 0.5, 1, 0.1]. 

2) Change two modes (only frequencies) at stream index 1000: 

a;= [0.1769, 0.1992, 0.4116, 0.6776, 0.8599]; 

and amplitudes 

d= [0.3, 0.8, 0.5, 1, 0.1]. 

3) Add one new mode at stream index 2000: 

UJ = [0.1769,0.1992,0.4116,0.6776,0.8599,0.9513]; 

and amplitudes 

d = [0.3, 0.8, 0.5, 1, 0.1,0.6]. 

4) Delete the weakest mode at stream index 3000: 

UJ = [0.1769, 0.1992, 0.4116, 0.6776, 0.9513]; 

and amplitudes 

d= [0.3, 0.8, 0.5, 1, 0.6]. 
Fig. |6] shows the ground truth of mode locations and amplitudes for the scenario above. Note that there 
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are three closely located modes and one weak mode in the beginning, which makes the task challenging. 
We compare the performance of the proposed PETRELS algorithm and GROUSE. The rank specified in 
both algorithms is r = 10, which is the estimated number of modes; in our case it is twice the number 
of true modes, and the estimated directions at each time for 10 modes are shown against the data stream 
index in Fig. |7] The color shows the amplitude corresponding to the color bar. The proposed PETRELS 
algorithm identifies all modes correctly. In particular it distinguishes the three closely-spaced modes 
perfectly in the beginning, and identifies the appearance and disappearance of the later weak mode. 

The auxiliary modes are exhibited as "noise" in the scatter plot. With GROUSE the closely spaced 
nodes are erroneously estimated as one mode, the weak mode is missing, and spurious modes have been 
introduced. The PETRELS algorithm also fully tracked the later changes in accordance with the entrance 
and exit of each mode, while GROUSE is not able to react to changes in the data model. 



0.4 - 
0.3 - 
0.2 H 
0.1 - 



I 



500 1000 1500 2000 2500 3000 3500 4000 
data stream index 



Fig. 6. Ground truth of the actual mode locations and amplitudes in a dynamic scenario. 



C. Matrix Completion 

We compare performance of the proposed PETRELS algorithm on matrix completion against batch al- 
gorithms LMaFit EH, FPCA lETl, Singular Value Thresholding (SVT) lH, OptSpace lEl and GROUSE 
|[T3l . The low -rank matrix is generated from a matrix factorization model with X = UV^ G j^iooox2000^ 
where U € Mioooxio ^j^^ y g ]^2000xio^ entries in U and V are generated from standard normal 
distribution AA(0, 1) (Gaussian data) or uniform distribution Zi[0, 1] (uniform data). The sampling rate is 
taken to be 0.05, so only 5% of all entries are revealed. 

The running time is plotted against the normalized matrix reconstruction error, calculated as ||X — 
X||i;'/||X||i7', where X is the reconstructed low -rank matrix for Gaussian data and uniform data re- 
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(a) PETRELS (b) GROUSE 

Fig. 7. Tracking of mode changes in direction-of-arrival estimation using PETRELS and GROUSE algorithms: the estimated 
directions at each time for 10 modes are shown against the data stream. All changes are identified and tracked successfully by 
PETRELS, but not by GROUSE. 



spectively in Fig. [8] (a) and (b). The proposed PETRELS algorithm matches the performance of batch 
algorithms on Gaussian data and improves upon the accuracy of most algorithms on uniform data, where 
the Grassmaniann-based optimization approach may encounter "barriers" for its convergence. Note that 
different algorithms have different input parameter requirements. For example, OptSpace needs to specify 
the tolerance to terminate the iterations, which directly decides the trade-off between accuracy and running 
time; PETRELS and GROUSE require an intial estimate of the rank. Our simulation here only shows 
one particular realization and we simply conclude that PETRELS is competitive. 
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Fig. 8. Comparison of matrix completion algorithms in terms of speed and accuracy: PETRELS is a competitive alternative 
for matrix completion tasks. 
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D. Simplified PETRELS 

Under the same simulation setup as for Fig. |2] except that the subspace of rank 10 is generated 
by Titrue = D^j-ueS, where 5] is a diagonal matrix with 5 entries from 7\A(0, 1) and 5 entries from 
0.01 • AA(0, 1), we examine the performance of the simplified PETRELS algorithm (with A = 0.9) in 
Section |IV] A and the original PETRELS (with A = 0.98) algorithm when the rank of the subspace is 
over-estimated as 12 or under-estimated as 8. When the rank of D is over-estimated, the change in ^ will 
introduce more errors and converges slower compared with the original PETRELS algorithm; however, 
when the rank of D is under-estimated, the simplified PETRELS performs better than PETRELS. 




10 I • • • • • • • • • 1 

200 400 600 800 1000 1200 1400 1600 1800 2000 
data stream index 



Fig. 9. The normalized subspace reconstruction error against data stream index when the rank is over-estimated as 12 or 
under-estimated as 8 for the original PETRELS and modified algorithm. 

VIL Conclusions 

We considered the problem of reconstructing a data stream from a small subset of its entries, where 
the data stream is assumed to lie in a low-dimensional linear subspace, possibly corrupted by noise. This 
has significant implications for lessening the storage burden and reducing complexity, as well as tracking 
the changes for applications such as image denoising, network monitoring and anomaly detection when 
the problem size is large. The notorious low-rank matrix completion problem can be viewed as a batch 
version of our problem. The PETRELS algorithm first identifies the underlying low-dimensional subspace 
via a discounted recursive procedure for each row of the subspace matrix in parallel, then reconstructs the 
missing entries via least-squares estimation if required. The discount factor allows the algorithm capture 
long-term behavior as well as track the changes of the data stream. We prove the convergence properties 
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of PETRELS by revealing its connection with the PAST algorithm in the full observation scenario. We 
demonstrate superior performance of PETRELS in direction-of-arrival estimation and showed that it is 
competitive with state of the art batch matrix completion algorithms. 
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